Driver distraction and fatigue are significant causes of road fatalities worldwide, accounting for an estimated 20-30% of all fatal accidents. Current Advanced Driver Assistance Systems (ADAS) analyse driver information and road information independently, missing the vital link between the two. This work proposes the Integrated Driver and Road Monitoring System (IDRMS), which monitors facial behavior and the road scene concurrently and fuses both data streams in a context-aware risk assessment engine. Facial features are extracted with MediaPipe FaceMesh (468 landmarks) and used to compute the Eye Aspect Ratio (EAR), the Mouth Aspect Ratio (MAR), the 6-degrees-of-freedom (6-DoF) head pose via Perspective-n-Point (PnP), and a 2-D gaze vector estimated from the irises. Two lightweight LSTM networks (HazardLSTM and CollisionLSTM), implemented entirely in NumPy and trained with online gradient descent, estimate the hazard level and the collision probability for each frame. A priority-based fusion engine maps the risk scores from both streams onto four levels (SAFE, CAUTION, WARNING, CRITICAL) and generates textual alerts together with beep and text-to-speech (TTS) outputs. The proposed approach runs at 20-30 FPS on a CPU in both the OpenCV-only mode and the YOLOv11 mode. Drowsiness detection reached 89% sensitivity and 92% specificity, gaze-away detection reached 85% accuracy, and the LSTM-based hazard classifier reached 91% accuracy with a false-positive rate below 4%, while context-aware fusion reduced spurious CRITICAL alerts by 47%.
Introduction
This paper presents the Integrated Driver and Road Monitoring System (IDRMS), also called Fusion Drive AI, a real-time intelligent safety system that combines driver monitoring and road scene analysis to reduce traffic accidents caused by drowsiness, distraction, and hazardous driving conditions. Since many severe road accidents are linked to driver-related factors such as fatigue and inattention, the system aims to provide context-aware risk assessment by considering both the driver's condition and the surrounding road environment.
The key innovation of IDRMS is its ability to simultaneously analyze driver behavior and road conditions, then fuse both sources of information into a unified hazard assessment. The system introduces an event-driven Fatigue Score that measures driver fatigue using microsleep events, yawns, and prolonged eye closures. Unlike traditional PERCLOS-based methods, it operates effectively at lower frame rates (10–30 FPS). Additionally, two lightweight LSTM neural networks, implemented entirely in NumPy without deep learning frameworks, provide real-time hazard classification and collision probability prediction based on temporal driving context.
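As a rough illustration of what an event-driven score can look like, the sketch below accumulates weighted fatigue events inside a sliding time window and clips the total to a 0–10 scale. The event weights, the 60-second window, and all names (FatigueScore, add_event, score) are assumptions made for this example and are not taken from the system. Because the score depends on event timestamps rather than on per-frame eye-closure percentages, a scheme of this kind is plausibly less sensitive to frame rate than PERCLOS.

```python
from collections import deque

# Hypothetical per-event weights; the actual values used by IDRMS are not stated.
EVENT_WEIGHTS = {"microsleep": 4.0, "prolonged_closure": 2.0, "yawn": 1.0}
WINDOW_SECONDS = 60.0  # assumed sliding-window length


class FatigueScore:
    """Accumulates weighted fatigue events inside a sliding time window."""

    def __init__(self, window=WINDOW_SECONDS):
        self.window = window
        self.events = deque()  # (timestamp, weight) pairs, oldest first

    def add_event(self, kind, timestamp):
        self.events.append((timestamp, EVENT_WEIGHTS[kind]))

    def score(self, now):
        # Drop events that have aged out of the window.
        while self.events and now - self.events[0][0] > self.window:
            self.events.popleft()
        # Sum the remaining weights and clip to the 0-10 scale.
        return min(10.0, sum(weight for _, weight in self.events))


tracker = FatigueScore()
tracker.add_event("yawn", timestamp=0.0)
tracker.add_event("microsleep", timestamp=30.0)
print(tracker.score(now=45.0))  # both events are still inside the window
print(tracker.score(now=70.0))  # the yawn has expired, the microsleep remains
```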
The architecture consists of four main modules:
Driver Monitoring Module
Uses MediaPipe FaceMesh for facial landmark detection.
Computes:
Eye Aspect Ratio (EAR) for drowsiness detection.
Mouth Aspect Ratio (MAR) for yawn detection.
Head pose estimation using OpenCV’s solvePnP.
Gaze direction tracking through iris landmarks.
Generates a fatigue score ranging from 0–10, where higher values indicate increasing fatigue severity.
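The EAR computation can be sketched as follows, using the standard six-landmark definition EAR = (|p2 - p6| + |p3 - p5|) / (2 |p1 - p4|). The threshold of 0.21, the minimum run of 3 frames, and the function names are illustrative assumptions; the MediaPipe landmark indices that feed the six points are not given here and are left out. MAR can be computed in the same way from mouth landmarks.

```python
import math

EAR_THRESHOLD = 0.21    # assumed closed-eye threshold
MIN_CLOSED_FRAMES = 3   # assumed minimum run length for a closure event


def eye_aspect_ratio(eye):
    """EAR from six (x, y) eye landmarks ordered p1..p6.

    p1 and p4 are the horizontal eye corners; (p2, p6) and (p3, p5)
    are the two vertical upper/lower lid pairs.
    """
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


def closure_events(ear_values, threshold=EAR_THRESHOLD,
                   min_frames=MIN_CLOSED_FRAMES):
    """Count runs of at least min_frames consecutive frames below threshold."""
    events, run = 0, 0
    for ear in ear_values:
        if ear < threshold:
            run += 1
            if run == min_frames:  # count each run once, when it first qualifies
                events += 1
        else:
            run = 0
    return events


open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
closed_eye = [(0, 0), (1, 0.1), (2, 0.1), (3, 0), (2, -0.1), (1, -0.1)]
print(eye_aspect_ratio(open_eye) > EAR_THRESHOLD)    # open eye is above threshold
print(eye_aspect_ratio(closed_eye) < EAR_THRESHOLD)  # closed eye is below it
```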
Road Monitoring Module
Detects lanes using Canny edge detection and Hough Transform.
Detects vehicles, pedestrians, and obstacles using YOLOv11 or lightweight OpenCV-based methods.
Estimates object distances and road hazards.
Uses two LSTM models:
HazardLSTM for hazard level prediction.
CollisionLSTM for short-term collision probability estimation.
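To illustrate how an LSTM can run in plain NumPy, the sketch below implements the forward pass of a single LSTM cell with a softmax head over four hazard levels. It is not the HazardLSTM itself: the class name TinyLSTM, the layer sizes, the random initialisation, and the per-frame feature vector are all assumptions, and the online gradient-descent training step is omitted. A collision-probability head could be sketched the same way with a single sigmoid output in place of the softmax.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyLSTM:
    """Single-layer LSTM with a softmax head (forward pass only)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        # One stacked weight matrix for the input, forget, output and
        # candidate gates, applied to the concatenation [x, h].
        self.W = rng.normal(0.0, 0.1, (4 * n_hidden, n_in + n_hidden))
        self.b = np.zeros(4 * n_hidden)
        self.W_out = rng.normal(0.0, 0.1, (n_out, n_hidden))
        self.b_out = np.zeros(n_out)
        self.n_hidden = n_hidden
        self.h = np.zeros(n_hidden)  # hidden state carried across frames
        self.c = np.zeros(n_hidden)  # cell state carried across frames

    def step(self, x):
        """Consume one frame's feature vector and return class probabilities."""
        n = self.n_hidden
        z = self.W @ np.concatenate([x, self.h]) + self.b
        i = sigmoid(z[0:n])            # input gate
        f = sigmoid(z[n:2 * n])        # forget gate
        o = sigmoid(z[2 * n:3 * n])    # output gate
        g = np.tanh(z[3 * n:4 * n])    # candidate cell update
        self.c = f * self.c + i * g
        self.h = o * np.tanh(self.c)
        logits = self.W_out @ self.h + self.b_out
        e = np.exp(logits - logits.max())  # numerically stable softmax
        return e / e.sum()


# Hypothetical 6-dimensional road features for a few consecutive frames.
model = TinyLSTM(n_in=6, n_hidden=8, n_out=4)
frames = np.random.default_rng(1).random((5, 6))
for features in frames:
    probs = model.step(features)
print(probs.shape)
```

Because the hidden and cell states persist between calls to step, each prediction depends on the preceding frames, which is what allows a temporal model of this kind to damp frame-to-frame fluctuations.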
Fusion Engine
Combines driver risk and road risk into a unified danger score.
Applies context-aware multipliers so that risks amplify each other.
For example, a drowsy driver in heavy traffic is considered significantly more dangerous than either risk alone.
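The amplification effect can be illustrated with a minimal sketch. Here both risks are assumed to be normalised to [0, 1], the larger risk sets the baseline, and the smaller one acts as a multiplier on it. The formula, the gain of 1.0, the level thresholds, and the function names are assumptions chosen for illustration and are not the fusion rule used by IDRMS.

```python
GAIN = 1.0  # assumed strength of the cross-stream amplification

# Assumed thresholds for the four alert levels, highest first.
LEVELS = [(0.75, "CRITICAL"), (0.50, "WARNING"), (0.30, "CAUTION")]


def fuse(driver_risk, road_risk, gain=GAIN):
    """Multiplicative fusion: the weaker risk amplifies the stronger one."""
    base = max(driver_risk, road_risk)
    multiplier = 1.0 + gain * min(driver_risk, road_risk)
    return min(1.0, base * multiplier)


def risk_level(score):
    """Map a fused score in [0, 1] onto one of the four levels."""
    for threshold, name in LEVELS:
        if score >= threshold:
            return name
    return "SAFE"


# A drowsy driver on an empty road versus the same driver in heavy traffic.
print(risk_level(fuse(0.6, 0.0)))  # driver risk alone
print(risk_level(fuse(0.6, 0.6)))  # both risks present
```

With these assumed numbers, a driver risk of 0.6 on its own stays at WARNING, while the same driver risk combined with a road risk of 0.6 is pushed to CRITICAL. When either stream reports zero risk the multiplier is 1, so a single elevated stream is never inflated by the other.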
Alert System
Prioritizes alerts based on severity.
Includes warnings for pedestrian crossings, potential collisions, microsleep events, prolonged gaze diversion, head rotation, yawning, and road hazards.
Prevents overlapping or redundant alerts.
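One simple way to combine severity ranking with suppression of repeated alerts is sketched below: only the most urgent active alert is considered in each frame, and it is silenced if it already fired within a cooldown period. The priority ordering, the 3-second cooldown, and the AlertManager class are hypothetical; the actual ranking used by the system is not specified here.

```python
# Assumed priority ranks (lower = more urgent); illustrative only.
PRIORITY = {
    "collision": 0,
    "pedestrian": 1,
    "microsleep": 2,
    "gaze_away": 3,
    "head_turn": 4,
    "yawn": 5,
    "road_hazard": 6,
}
COOLDOWN_SECONDS = 3.0  # assumed minimum gap between repeats of one alert


class AlertManager:
    """Emits at most one alert per frame and suppresses rapid repeats."""

    def __init__(self, cooldown=COOLDOWN_SECONDS):
        self.cooldown = cooldown
        self.last_fired = {}  # alert kind -> time it last fired

    def select(self, active, now):
        """Return the most urgent active alert, or None if nothing should fire."""
        if not active:
            return None
        top = min(active, key=PRIORITY.__getitem__)
        last = self.last_fired.get(top)
        if last is not None and now - last < self.cooldown:
            return None  # the same alert fired recently; stay silent
        self.last_fired[top] = now
        return top


manager = AlertManager()
print(manager.select({"yawn", "collision"}, now=0.0))  # most urgent alert fires
print(manager.select({"yawn", "collision"}, now=1.0))  # repeat is suppressed
```

In this sketch a lower-priority alert is not emitted while a higher-priority one is active, even during the cooldown, so that a yawn warning never talks over a collision warning.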
Experimental evaluation showed strong performance across multiple tasks:
Subsystem | Accuracy (unless noted)
Drowsiness Detection | 91.3%
Yawn Detection | 93.8%
Gaze Detection | 87.1%
Head Pose Detection | 89.1%
Lane Detection | 82.3%
Vehicle Detection (YOLOv11) | 78.6% mAP
Hazard Classification (LSTM) | 91.4%
Collision Prediction | AUC = 0.938
The fusion-based approach significantly outperformed driver-only and road-only systems:
Alert Precision: 84.2%
Alert Recall: 91.7%
False Critical Alerts: Reduced to 15.8%
Compared to additive fusion methods, the proposed multiplicative context-aware fusion achieved higher precision and recall while reducing false alarms. The LSTM-based temporal smoothing also reduced hazard oscillations by 73.8%, improving stability and minimizing unnecessary alerts.
A major advantage of IDRMS is that it runs entirely on a CPU without requiring GPUs, cloud services, or heavy deep-learning frameworks, making it practical for low-cost deployment while preserving privacy. Its modular design allows components such as object detectors or fatigue detectors to be upgraded independently.
However, the system has limitations, including reduced lane detection performance in poor weather or low-light conditions, support for only one driver at a time, and reliance on monocular cameras for approximate distance estimation. Future improvements could include stereo vision, LiDAR integration, and support for multiple occupants.
Conclusion
This paper described Fusion Drive AI, a real-time safety monitoring framework for ADAS that fuses driver facial-state analysis with road scene interpretation in a context-aware manner. Fusion Drive AI achieves a drowsiness sensitivity of 89.2%, an LSTM hazard classification accuracy of 91.4%, and a fused alert precision of 84.2% at 22 frames per second on a commodity CPU without GPU assistance. The event-driven Fatigue Score, the LSTM-based temporal smoothing, and the priority-ranked fusion algorithm address many of the problems with earlier unimodal and multimodal ADAS approaches.
In future work, we aim to advance along five tracks. First, we will incorporate a stereo camera or a monocular depth estimator into IDRMS to compute a metric time-to-contact instead of a normalised distance. Second, we will improve robustness at night and in adverse weather by adding near-infrared lighting and domain-adaptive image enhancement. Third, we will add GPS and digital maps so that road-classification context (urban/motorway) can be taken into account by the fusion component. Fourth, we will scale the solution to fleets by aggregating anonymized session statistics via cloud infrastructure. Fifth, we will explore learned adaptive weighting in the fusion component so that it can adjust more flexibly to the driving situation.